[SPARK-55128][INFRA] Restore SQL tests by pin 'pandas==2.3.3' #53910
Closed
zhengruifeng wants to merge 2 commits into apache:master from zhengruifeng:restore_sql
JIRA Issue Information
=== Test SPARK-55128 ===
This comment was automatically generated by GitHub Actions
Contributor
Author
LuciferYang reviewed Jan 22, 2026
  if: (contains(matrix.modules, 'sql') && !contains(matrix.modules, 'sql-')) || contains(matrix.modules, 'connect') || contains(matrix.modules, 'yarn')
  run: |
-   python3.11 -m pip install 'numpy>=1.22' pyarrow pandas pyyaml scipy unittest-xml-reporting 'lxml==4.9.4' 'grpcio==1.76.0' 'grpcio-status==1.76.0' 'protobuf==6.33.0' 'zstandard==0.25.0'
+   python3.11 -m pip install 'numpy>=1.22' pyarrow 'pandas==2.3.3' pyyaml scipy unittest-xml-reporting 'lxml==4.9.4' 'grpcio==1.76.0' 'grpcio-status==1.76.0' 'protobuf==6.33.0' 'zstandard==0.25.0'
Contributor
Is this the only place where the version needs to be pinned? For example, is it not necessary in other places like requirements.txt?
Contributor
Author
The file requirements.txt is only for developers; we may want to develop and test with pandas 3.0 in a local environment, and the current mlflow requires pandas<3.
LuciferYang approved these changes Jan 22, 2026
zhengruifeng added a commit that referenced this pull request Jan 22, 2026
### What changes were proposed in this pull request?
Restore SQL tests by pinning 'pandas<3'.

### Why are the changes needed?
pandas 3 was just released, and it fails the SQL tests:
https://github.com/apache/spark/actions/runs/21232213791/job/61092886134

Currently, pandas 3 doesn't affect the Python tests much:
1. In `dev/requirements.txt`, the latest `mlflow==3.8.1` requires `pandas<3`.
2. `pandas==2.3.3` is pinned in most places.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #53910 from zhengruifeng/restore_sql.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
(cherry picked from commit dafb2cd)
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
Contributor
Author
merged to master/4.1
zhengruifeng pushed a commit that referenced this pull request Jan 23, 2026
…3' for maven daily test

### What changes were proposed in this pull request?
Similar to #53910, this PR pins the pandas version to 2.3.3.

### Why are the changes needed?
To restore the SQL tests in the Maven daily test.
- https://github.com/apache/spark/actions/runs/21249870076/job/61148348328

```
- udf/postgreSQL/udf-case.sql - Scalar Pandas UDF *** FAILED ***
  udf/postgreSQL/udf-case.sql - Scalar Pandas UDF
  Python: 3.11
  Pandas: 3.0.0
  PyArrow: 23.0.0
  Expected Some("struct<Two:string,i:int,f:double,i:int,j:int>"), but got Some("struct<>") Schema did not match for query #30
  SELECT '' AS `Two`, * FROM CASE_TBL a, CASE2_TBL b WHERE udf(COALESCE(f,b.i) = 2): -- !query
  SELECT '' AS `Two`, * FROM CASE_TBL a, CASE2_TBL b WHERE udf(COALESCE(f,b.i) = 2)
  -- !query schema
  struct<>
  -- !query output
  org.apache.spark.SparkRuntimeException
  {
    "errorClass" : "CAST_INVALID_INPUT",
    "sqlState" : "22018",
    "messageParameters" : {
      "ansiConfig" : "\"spark.sql.ansi.enabled\"",
      "expression" : "'nan'",
      "sourceType" : "\"STRING\"",
      "targetType" : "\"BOOLEAN\""
    },
    "queryContext" : [ {
      "objectType" : "",
      "objectName" : "",
      "startIndex" : 62,
      "stopIndex" : 85,
      "fragment" : "udf(COALESCE(f,b.i) = 2)"
    } ]
  } (SQLQueryTestSuite.scala:681)
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Monitor the Maven daily test after the PR is merged.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #53933 from LuciferYang/SPARK-55128-FOLLOWUP.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
LuciferYang added a commit that referenced this pull request Jan 23, 2026
…3' for maven daily test

Closes #53933 from LuciferYang/SPARK-55128-FOLLOWUP.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
(cherry picked from commit 3f1c9a3)
Signed-off-by: yangjie01 <yangjie01@baidu.com>
Member
@zhengruifeng I guess this is also required for branch-4.0; it seems to have been broken for a while.
pan3793 pushed a commit to pan3793/spark that referenced this pull request Feb 11, 2026
Closes apache#53910 from zhengruifeng/restore_sql.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
pan3793 pushed a commit to pan3793/spark that referenced this pull request Feb 11, 2026
…3' for maven daily test

Closes apache#53933 from LuciferYang/SPARK-55128-FOLLOWUP.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
pan3793 pushed a commit that referenced this pull request Feb 11, 2026
Backport #53910 and #53933 to branch-4.0

### What changes were proposed in this pull request?
Pandas 3.0 has been released; pin 'pandas==2.3.3' to recover the CI.

### Why are the changes needed?
To recover the CI.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Wait for the GHA result.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #54263 from pan3793/SPARK-55128-4.0.

Lead-authored-by: Ruifeng Zheng <ruifengz@apache.org>
Co-authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>

What changes were proposed in this pull request?
Restore SQL tests by pinning 'pandas<3'.
Why are the changes needed?
pandas 3 was just released, and it fails the SQL tests:
https://github.com/apache/spark/actions/runs/21232213791/job/61092886134
Currently, pandas 3 doesn't affect the Python tests much:
1. In dev/requirements.txt, the latest mlflow==3.8.1 requires pandas<3.
2. pandas==2.3.3 is pinned in most places.
Does this PR introduce any user-facing change?
no
How was this patch tested?
ci
Was this patch authored or co-authored using generative AI tooling?
no
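As a hedged illustration of the pin described above (not part of this PR), a CI job could fail fast when a pandas 3.x release slips into the environment, instead of surfacing later as SQL test failures. This is a minimal sketch under stated assumptions: the helper names `major_version`, `pandas_is_pre_3` and `check_installed_pandas` are hypothetical, and the script only reads the installed distribution's version through the standard-library `importlib.metadata`; it never imports pandas itself.

```python
# Hypothetical fail-fast guard (not part of apache/spark): checks that the
# installed pandas satisfies pandas<3, which the 'pandas==2.3.3' pin guarantees.
import sys
from importlib import metadata


def major_version(version: str) -> int:
    """Return the leading integer of a version string, e.g. '3.0.0rc1' -> 3."""
    return int(version.split(".", 1)[0])


def pandas_is_pre_3(version: str) -> bool:
    """True when the given pandas version satisfies the pandas<3 constraint."""
    return major_version(version) < 3


def check_installed_pandas() -> int:
    """Return a process exit code: 0 if the installed pandas is < 3, else 1."""
    try:
        installed = metadata.version("pandas")
    except metadata.PackageNotFoundError:
        print("pandas is not installed", file=sys.stderr)
        return 1
    if not pandas_is_pre_3(installed):
        print(f"pandas {installed} found, but the SQL tests expect pandas<3",
              file=sys.stderr)
        return 1
    print(f"pandas {installed} satisfies pandas<3")
    return 0
```

A workflow step could call `sys.exit(check_installed_pandas())` right after the `pip install` line. Checking only the major version mirrors the `pandas<3` constraint that mlflow also imposes; comparing against the exact string '2.3.3' would be the stricter alternative.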